fix: avoid unicode filepath suffix panic #393

Merged
dmtrKovalenko merged 3 commits into dmtrKovalenko:main from tmdgusya:fix/unicode-filepath-suffix-panic on Apr 21, 2026

@tmdgusya
Contributor

Summary

  • fix a remaining UTF-8 boundary panic in path_ends_with_suffix()
  • return false instead of panicking when a byte-derived suffix offset lands inside a multibyte character
  • add regression tests for the helper and for apply_constraints(Constraint::FilePath(...))

Similar PR / duplicate check

I checked existing PRs before opening this:

  • closest prior fix: #373 (fix: Unicode segmentation crash)
  • no open PR currently covers this remaining Constraint::FilePath / path_ends_with_suffix() panic path

This PR is intentionally narrow: it fixes the unchecked path[start..] slice in crates/fff-core/src/constraints.rs without changing matching semantics.

Root cause

path_ends_with_suffix() computed:

let start = path.len() - suffix.len();

and then sliced with:

path[start..]

start is a byte offset, not guaranteed to be a UTF-8 char boundary. For Unicode filenames, a non-matching suffix can make start land inside a multibyte codepoint, which panics before constraint filtering can return false.
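The failure mode can be reproduced in isolation. The sketch below is not the code from constraints.rs; `suffix_start`, `unchecked_tail`, and `slice_panics` are hypothetical helpers, and the suffix `x.csv` is an assumed example of a non-matching suffix. With the synthetic filename used in this PR, the path is 38 bytes long, so a 5-byte suffix gives `start = 33`, which falls inside the 3-byte encoding of `트` (bytes 31..34).

```rust
use std::panic;

/// Byte offset at which a suffix of `suffix.len()` bytes would start in `path`.
/// Returns None when the suffix is longer than the path.
fn suffix_start(path: &str, suffix: &str) -> Option<usize> {
    path.len().checked_sub(suffix.len())
}

/// Mirrors the unchecked `path[start..]` slice described above.
/// Panics when `start` is not a UTF-8 char boundary.
fn unchecked_tail(path: &str, start: usize) -> &str {
    &path[start..]
}

/// Reports whether the unchecked slice panics for this offset.
/// The panic message is still printed to stderr by the default hook.
fn slice_panics(path: &str, start: usize) -> bool {
    panic::catch_unwind(|| unchecked_tail(path, start).len()).is_err()
}
```

For the same offset, `path.get(33..)` returns `None` instead of panicking, which is the property the fix relies on.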

Fix

Use path.get(start..) instead of unchecked indexing:

  • if start is not a valid char boundary, return false
  • otherwise preserve the existing ASCII case-insensitive suffix comparison and / boundary behavior
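A minimal sketch of the checked variant, under stated assumptions: the function name `ends_with_suffix_sketch` is hypothetical, the comparison is assumed to be `eq_ignore_ascii_case`, and the "/ boundary behavior" is assumed to mean that a match must begin at the start of the path or immediately after a `/`. The actual helper in constraints.rs may differ in these details.

```rust
/// Hypothetical stand-in for the suffix check; not the code from the PR.
fn ends_with_suffix_sketch(path: &str, suffix: &str) -> bool {
    // A suffix longer than the path can never match; checked_sub also
    // avoids an integer underflow.
    let Some(start) = path.len().checked_sub(suffix.len()) else {
        return false;
    };
    // `get` returns None when `start` is inside a multibyte character,
    // so a non-boundary offset becomes "no match" instead of a panic.
    let Some(tail) = path.get(start..) else {
        return false;
    };
    if !tail.eq_ignore_ascii_case(suffix) {
        return false;
    }
    // Assumed boundary rule: the match starts the path or follows a '/'.
    // Reading a single byte is safe; only &str slicing needs a char boundary.
    start == 0 || path.as_bytes()[start - 1] == b'/'
}
```

Under these assumptions, `ends_with_suffix_sketch("data/유니코드_파일_테스트.csv", "x.csv")` returns false rather than panicking, and ASCII-only paths behave as before.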

Verification

I avoided using any user-specific filename in tests and instead used synthetic Unicode fixture names.

Reproduction guard added

New tests in crates/fff-core/src/constraints.rs:

  • test_path_ends_with_suffix_does_not_panic_on_unicode_suffix
  • test_apply_constraints_file_path_with_unicode_suffix
  • test_path_contains_segment_does_not_panic_on_unicode_segment

The important regression case uses a synthetic filename such as:

  • data/유니코드_파일_테스트.csv

and a deliberately non-matching suffix that would previously place the byte offset in the middle of a multibyte character.

Commands run

cargo test -p fff-search constraints::tests -- --nocapture

Result

All constraint tests pass locally after the fix, including the new Unicode regression coverage.

Scope notes

  • no parser changes
  • no matching behavior expansion
  • no Unicode normalization/case-folding changes
  • only panic prevention for valid UTF-8 input in the file path suffix constraint path

@dmtrKovalenko
Owner

@copilot resolve the merge conflicts in this pull request

dmtrKovalenko and others added 2 commits April 19, 2026 09:55
The previous fix (a09292e) only guarded path_ends_with_suffix with
path.get(start..), but three problems remained:

1. path_ends_with_suffix: path_bytes[start - 1] reads inside a
   multi-byte char when start is a valid boundary but start-1 is not.
   Fixed by scanning backward to find the preceding ASCII byte.

2. path_contains_segment: path[..segment_len] and path[start..end]
   slice at non-char-boundary offsets when segment is ASCII but the
   path contains multi-byte UTF-8 (Korean, etc).
   Fixed with is_char_boundary() checks before each slice.

3. file_has_extension: same byte-offset issue for dot_pos.
   Fixed with is_char_boundary() check.

Adds regression tests with the exact Korean filenames that caused
panics (커리큘럼, 세부_커리큘럼_최종, 설치-및-기본-사용, etc).
Merges upstream unicode tests (apostrophe, narrow-space mismatches).
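The `is_char_boundary()` guards named in the commit message above can be illustrated as follows. This is a sketch, not the merged code: `has_extension_sketch` and `starts_with_segment_sketch` are hypothetical names, their matching rules are assumptions, and the paths in the examples are synthetic ones built around the Korean names quoted in the commit message.

```rust
/// Hypothetical extension check: true when `path` ends with '.' + `ext`
/// (ASCII case-insensitive).
fn has_extension_sketch(path: &str, ext: &str) -> bool {
    // Need room for the dot plus the extension.
    let Some(dot_pos) = path.len().checked_sub(ext.len() + 1) else {
        return false;
    };
    // Guard before slicing: dot_pos + 1 may fall inside a multibyte char.
    if !path.is_char_boundary(dot_pos + 1) {
        return false;
    }
    path.as_bytes()[dot_pos] == b'.' && path[dot_pos + 1..].eq_ignore_ascii_case(ext)
}

/// Hypothetical leading-segment check: true when `path` is `segment`
/// or begins with `segment` followed by '/'.
fn starts_with_segment_sketch(path: &str, segment: &str) -> bool {
    let n = segment.len();
    // is_char_boundary returns false for n > path.len(), so this guard
    // also covers the out-of-range case.
    if !path.is_char_boundary(n) {
        return false;
    }
    path[..n].eq_ignore_ascii_case(segment)
        && (n == path.len() || path.as_bytes()[n] == b'/')
}
```

In both helpers an ASCII needle applied to a path containing multibyte characters yields false when the computed offset splits a character, for example `starts_with_segment_sketch("설치-및-기본-사용/readme.md", "docs")`, where offset 4 lies inside `치`.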
@tmdgusya tmdgusya force-pushed the fix/unicode-filepath-suffix-panic branch from 91cd463 to f067e82 Compare April 21, 2026 12:56
@dmtrKovalenko dmtrKovalenko merged commit a24cf55 into dmtrKovalenko:main Apr 21, 2026
40 checks passed
tmdgusya added a commit to tmdgusya/roach-pi that referenced this pull request Apr 22, 2026
0.6.0 crashes the fff-bg indexing thread on UTF-8 multibyte filenames
(e.g. Korean, emoji) due to a non-char-boundary &str slice in
path_ends_with_suffix() panicking across the FFI boundary. Fixed
upstream in dmtrKovalenko/fff#393; pin nightly until a stable
0.6.2 is cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot pushed a commit to tmdgusya/roach-pi that referenced this pull request Apr 22, 2026
## [1.9.5](v1.9.4...v1.9.5) (2026-04-22)

### Bug Fixes

* **fff:** pin @ff-labs/fff-node to 0.6.2-nightly.acd2f0c ([4470b66](4470b66)), closes [dmtrKovalenko/fff#393](dmtrKovalenko/fff#393)

### Miscellaneous

* upgrade @ff-labs/fff-node 0.5.2 → 0.6.0 ([3b1ea47](3b1ea47))